Double Q($\sigma$) and Q($\sigma, \lambda$): Unifying Reinforcement Learning Control Algorithms

Author

  • Markus Dumke
Abstract

Temporal-difference (TD) learning is an important field in reinforcement learning. Sarsa and Q-Learning are among the most widely used TD algorithms. The Q(σ) algorithm (Sutton and Barto, 2017) unifies both. This paper extends the Q(σ) algorithm to an online multi-step algorithm Q(σ, λ) using eligibility traces and introduces Double Q(σ) as the extension of Q(σ) to double learning. Experiments suggest that the new Q(σ, λ) algorithm can outperform the classical TD control methods Sarsa(λ), Q(λ) and Q(σ).
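The one-step Q(σ) backup interpolates between the sampled Sarsa target (σ = 1) and the Expected-Sarsa target (σ = 0). A minimal sketch of that target computation, with hypothetical variable names (`q_next` for the next state's action values, `pi_next` for the policy's action probabilities in that state):

```python
import numpy as np

def q_sigma_target(reward, gamma, sigma, q_next, pi_next, a_next):
    """One-step Q(sigma) return: a sigma-weighted mix of the Sarsa
    (sampled) and Expected-Sarsa (expected) backup targets."""
    sarsa_term = q_next[a_next]              # sampled next action value
    expected_term = np.dot(pi_next, q_next)  # expectation under the policy
    return reward + gamma * (sigma * sarsa_term
                             + (1.0 - sigma) * expected_term)
```

With σ = 1 the expectation term drops out and the target reduces to Sarsa's; with σ = 0 it is Expected Sarsa's, whose greedy-target-policy special case is Q-Learning.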


Similar resources

Mini/Micro-Grid Adaptive Voltage and Frequency Stability Enhancement Using Q-learning Mechanism

This paper develops an adaptive method for controlling the frequency and voltage of an islanded mini/micro grid (M/µG) using reinforcement learning. Reinforcement learning (RL) is a branch of machine learning and a principal solution method for Markov decision processes (MDPs). Among the several solution methods of RL, the Q-learning method is used for solving RL in th...


Two Novel On-policy Reinforcement Learning Algorithms based on TD(λ)-methods

This paper describes two novel on-policy reinforcement learning algorithms, named QV(λ)-learning and the actor critic learning automaton (ACLA). Both algorithms learn a state value-function using TD(λ)-methods. The difference between the algorithms is that QV-learning uses the learned value function and a form of Q-learning to learn Q-values, whereas ACLA uses the value function and a learning ...
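The coupled updates described above can be sketched with one-step TD for brevity (the paper's QV(λ)-learning uses eligibility traces for the state-value update); all names and parameters here are illustrative assumptions:

```python
def qv_update(V, Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One QV-learning step (one-step TD sketch): the state values V are
    learned by TD, and the Q-values bootstrap from the learned V rather
    than from max-Q as in plain Q-learning."""
    target = r + gamma * V[s_next]             # shared bootstrapped target
    Q[(s, a)] += alpha * (target - Q[(s, a)])  # Q-update uses learned V
    V[s] += alpha * (target - V[s])            # TD(0) state-value update
    return V, Q
```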


Coexistence and criticality in size-asymmetric hard-core electrolytes

Liquid-vapor coexistence curves and critical parameters for hard-core 1:1 electrolyte models with diameter ratios λ = σ−/σ+ = 1 to 5.7 have been studied by fine-discretization Monte Carlo methods. Normalizing via the length scale σ± = ½(σ+ + σ−), relevant for the low densities in question, both T*_c (= k_B T_c σ±/q²) and ρ*_c (= ρ_c s...


Q Learning based Reinforcement Learning Approach to Bipedal Walking Control

Reinforcement learning has been an active research area in recent years, not only in machine learning but also in control engineering, operations research and robotics. It is a model-free learning control method that can solve Markov decision problems. Q-learning is an incremental dynamic programming procedure that determines the optimal policy in a step-by-step manner. It is an online procedure for...
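The step-by-step (incremental) procedure described above can be sketched as a tabular update rule; the table shape, step size and discount below are illustrative assumptions:

```python
import numpy as np

def q_learning_step(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular Q-learning update: move Q[s, a] toward the off-policy
    target r + gamma * max_a' Q[s_next, a'], with no model of the MDP."""
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
    return Q
```

Because the target maximizes over next actions, the learned Q-values approximate the optimal policy regardless of the (exploratory) behaviour policy that generated the transitions.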


Lie ternary $(\sigma,\tau,\xi)$--derivations on Banach ternary algebras

Let $A$ be a Banach ternary algebra over a scalar field $\Bbb R$ or $\Bbb C$ and $X$ be a ternary Banach $A$--module. Let $\sigma,\tau$ and $\xi$ be linear mappings on $A$. A linear mapping $D:(A,[~]_A)\to (X,[~]_X)$ is called a Lie ternary $(\sigma,\tau,\xi)$--derivation if $$D([a,b,c])=[[D(a)bc]_X]_{(\sigma,\tau,\xi)}-[[D(c)ba]_X]_{(\sigma,\tau,\xi)}$$ for all $a,b,c\in A$, where $[abc]_{(\sigma,\tau,\xi)}=ata...




Journal:

Volume   Issue

Pages  -

Publication date: 2017